
Cancer Epidemiology, Biomarkers & Prevention

American Association for Cancer Research (AACR)

Preprints posted in the last 30 days, ranked by how well they match Cancer Epidemiology, Biomarkers & Prevention's content profile, based on 17 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Development and validation of a digital pathology artificial intelligence (DPAI)-based biomarker predicting risk of Gleason grade group reclassification for patients who are candidates for active surveillance

Mabey, B.; Lenz, L. H.; Schiewer, M. J.; Rayford, W.; Muhammad, H.; Huang, W.; Finch, R.; Nakamoto, C.; Kouros-Mehr, H.; Jasper, J.; Basu, H.; Feng, C.; Sharma, A.; Wilding, G.; Roy, R.; Muzzey, D.; Gutin, A.

2026-05-20 oncology 10.64898/2026.05.15.26353328 medRxiv
Top 0.1%
10.1%

Aims Active surveillance (AS) allows selected men with localized prostate cancer to defer curative therapy and reduce treatment morbidity. Conversion from AS to treatment is commonly triggered by Gleason grade group (GGG) upgrading on confirmatory biopsy. We developed and validated a digital pathology artificial intelligence (DPAI) biomarker to predict GGG upgrading in AS-eligible patients. Materials & Methods The DPAI model was trained using histopathology image features from diagnostic biopsies of 998 patients and validated in an independent cohort of 296 patients meeting criteria for AS. Logistic regression estimated the probability of confirmatory-biopsy GGG increase, and feature selection identified the most predictive variables. Results AI-GUR (Artificial Intelligence-Gleason Upgrade Risk) predicted GGG reclassification at confirmatory biopsy (OR 1.60; p=0.0003), and provided information beyond conventional stratification (risk group, CAPRA) and cribriform morphology (all p<0.01). Predicted risks were similar across time from diagnosis (~10-15% to ~85% at 1, 1.5, or 2 years; p for time=0.50), consistent with initial biopsy mischaracterization rather than time-dependent progression. Conclusions AI-GUR provides individualized estimates of confirmatory-biopsy GGG upgrading for AS candidates. Using DPAI may improve shared decision-making by complementing standard clinicopathologic tools and molecular testing using the same biopsy specimen, while informing the likelihood of grade upgrade at confirmation.

2
Allostatic Load in Endometrial Cancer Disparities

Bey, G. S.; Bowen, M. B.; Wu, S.; Boykin, M.; Bernard, L.; Zhang, Q.; Melendez, B.; Celestino, J.; Batsis, J. A.; Sun, C.; Lin, F.-C.; Yates, M. S.

2026-06-11 oncology 10.64898/2026.06.06.26355062 medRxiv
Top 0.1%
8.6%

Background: Endometrial cancer incidence and mortality are increasing, particularly among Black women and for aggressive subtypes. Allostatic load (AL), a composite measure of physiologic dysregulation across metabolic, cardiovascular, and immune systems, varies by racial category and tumor subtype in other cancers. Endometrial cancer is strongly associated with obesity, and it is unknown whether AL scores maintain sufficient heterogeneity to evaluate differences across subgroups or with clinical outcomes. Objective: To describe the performance of AL scoring in endometrial cancer patients and examine associations with tumor characteristics (grade/histology) and survival outcomes. Methods: We evaluated AL among 398 participants newly diagnosed with endometrial cancer. AL score was calculated by assigning 1 point for each "high-risk" value (defined by clinical reference range or distribution-based threshold) across 15 biologic variables spanning vital signs, anthropometrics, blood-based biomarkers, and medical comorbidities. Results: Distribution-based thresholds for variables were used to preserve heterogeneity in this obesity-dominant context. Overall, 68.7% of Black women had high AL compared to White (56.7%), Hispanic (56.7%), and other race (32.3%) women. Decision tree analyses revealed grade-dependent associations between AL and survival. For women with low-grade tumors, higher AL was associated with poorer overall survival. For high-grade tumors, intermediate AL (≥4, <8) was associated with shortest overall survival. Black women with low-grade disease experienced shorter progression-free survival regardless of AL. Conclusions: AL scoring maintains heterogeneity despite high obesity prevalence in endometrial cancer. Varying relationships between AL and survival by tumor grade and ethnoracial group suggest cumulative physiologic burden and social/structural factors may jointly shape endometrial cancer disparities.
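The one-point-per-high-risk-value scoring described in this abstract can be sketched as below. This is an illustration only, not the authors' implementation: the variable names and cutoffs are hypothetical, and the sketch assumes that a higher value always means higher risk, which would not hold for markers where a low value is the risky direction.

```python
from typing import Mapping


def allostatic_load(values: Mapping[str, float],
                    high_risk_cutoffs: Mapping[str, float]) -> int:
    """Count the variables whose value is at or above its high-risk cutoff.

    Variables missing from `values` contribute no points (an assumption;
    the abstract does not say how missing data were handled).
    """
    return sum(
        1
        for name, cutoff in high_risk_cutoffs.items()
        if name in values and values[name] >= cutoff
    )


# Hypothetical cutoffs and one hypothetical participant.
cutoffs = {"systolic_bp": 140.0, "bmi": 30.0, "crp": 3.0}
participant = {"systolic_bp": 150.0, "bmi": 24.0, "crp": 5.0}
score = allostatic_load(participant, cutoffs)  # two of three variables are high-risk
```

With distribution-based thresholds, each cutoff would instead be derived from the cohort itself (for example a chosen percentile), which is what keeps scores spread out when most participants exceed a fixed clinical cutoff.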

3
Neighborhood Deprivation and Racial Disparities in Metastatic Prostate Cancer at Diagnosis: A Population-Based Study in Ohio

Payne, J. Y.; Rhodes, S.; Shoag, J.; Rothberg, M.; Le, P.; Cullen, J.; Hartman, H.

2026-06-03 epidemiology 10.64898/2026.06.02.26354723 medRxiv
Top 0.1%
8.4%

Background: Prostate cancer survival varies by stage at diagnosis, and Black men experience a disproportionate burden of advanced disease. We examined whether neighborhood deprivation, measured by Area Deprivation Index (ADI), contributes to racial differences in metastatic presentation. Methods: We conducted a population-based study of men diagnosed with prostate cancer in the Ohio Cancer Incidence Surveillance System from 1996 to 2016. The primary endpoint was distant-stage disease at diagnosis. Generalized additive models assessed nonlinear associations of ADI and diagnosis year with metastatic risk. Inverse probability of treatment weighting (IPTW) models estimated odds ratios comparing Black with White men after sequential adjustment for diagnosis year, age, insurance, and ADI. Results: Among 135,095 men, 18,690 were Black and 116,405 were White. Distant-stage disease occurred in 7.0% of Black men and 5.0% of White men. Black men had higher median ADI (60.9 vs. 47.3). Medicaid-insured men had the highest unadjusted odds of metastatic presentation (OR, 4.68; 95% CI, 4.13-5.31), exceeding uninsured men (OR, 2.91; 95% CI, 2.54-3.34). In IPTW models without age adjustment, the odds ratio decreased from 1.54 to 1.24 after adding insurance and ADI. In age-adjusted IPTW models, the odds ratio decreased from 1.79 to 1.41 after adding insurance and ADI. Generalized additive models showed increasing metastatic risk at higher ADI values and after 2008. Conclusions: Neighborhood deprivation and insurance-related access explained part, but not all, of the excess odds of metastatic diagnosis among Black men. Impact: Integrating ADI into cancer surveillance may improve identification of populations at risk for late-stage diagnosis.

4
Connecting Baseline Immune Exhaustion in Hot Tumors to Oral Cancer Recurrence and Nodal Metastasis

Shaikh, S.; Basu, S.; Hajihosseini, M.; Nandy, S. K.; Moorthy, M.; Arun, I.; Lali, B. S.; Arun, P.; Mukherjee, G.; Pyne, S.

2026-05-30 oncology 10.64898/2026.05.27.26354295 medRxiv
Top 0.1%
8.4%

Background: The use of immune checkpoint inhibitors (ICIs) in the treatment of cancer has rapidly expanded over the last decade. However, there are several knowledge gaps in understanding how tumor cells evade the immune system. There is a paucity of data in HPV-negative oral cancer, particularly of the gingivobuccal region. Understanding the mechanism of immune system evasion in this cancer is vital for improving patient outcomes. Methods: We characterized the baseline immune milieu of oral cancer using immunohistochemistry (IHC) on whole tumor sections from 124 cases. Tumors were classified as hot or cold and further stratified into high-risk and low-risk groups. High-risk patients included those with lymph node metastasis at diagnosis/recurrence or distant metastasis within 2 years of treatment completion. Patients without these features were categorized as low risk. Validation by RNA-Seq and Joint Enrichment Analysis of Oncogenic and Immunologic Pathways was carried out in a subset of 46 cases. Results: Hot high-risk tumors (by IHC) were distinguished by elevated PD-L1 expression and reduced NK-cell, PD1, and CTLA-4 expression. There was no difference in the expression levels of CD3+, CD8+, granzyme, or perforin compared to hot low-risk tumors, findings that align with the definition of hot tumors. RNA-Seq revealed a gene signature associated with exhausted T-cells in hot high-risk tumors. Gene and pathway analyses identified differential upregulation of isoform-specific TOX, TCF, CXCR, RUNX, IRF, BRD and BCL6 genes, implicating immune cell exhaustion and tumor aggressiveness. Significantly downregulated genes included PDCD1, HAVCR2, ZAP70, and STAT, indicative of a disabled immune microenvironment. These findings support that a state of immune exhaustion in hot high-risk (HHR) tumors is driven by progenitor exhausted T-cells and terminally exhausted T-cells, independent of PD1-TIM3.
Conclusion: These findings suggest that combining TOX/TCF/BCL6 inhibitors with immune checkpoint inhibitors in the adjuvant setting might benefit patients with hot high-risk tumors. Given the results, testing for a targeted exhaustion-related gene panel at diagnosis is recommended for oral cancers to stratify tumors as high-risk or low-risk. Larger validation studies and clinical trials are now warranted.

5
Targeted BRCA1/BRCA2 Sequencing in a Bangladeshi Clinically Referred Cohort Identifies Candidate BRCA1 Loss-of-Function Variants and a Multi-Exon Deletion-Like CNV Signal

Al Sium, S. M.; Banu, T. A.; Goswami, B.; Naser, S. R.; Habib, M. A.; Akter, S.; Ara, M. H.; Al Din, S. M. S.; Nafisa, A.; Nayem, M. R.; Rabbi, M. F. A.; Sarkar, M. M. H.; Khan, M. S.

2026-05-20 oncology 10.64898/2026.05.11.26352643 medRxiv
Top 0.1%
6.7%

Background: Population-relevant BRCA1/BRCA2 data from Bangladesh are scarce, creating challenges for hereditary breast and ovarian cancer variant interpretation, counseling, and follow-up testing. We examined a clinically referred Bangladeshi cohort to characterize assay-derived BRCA1/BRCA2 short variants, sequencing-depth performance, and copy-number findings in a conservative pilot framework. Methods: Twenty-three de-identified blood-derived DNA samples were assessed using a targeted BRCA1/BRCA2 next-generation sequencing workflow. Downstream analysis used assay-generated short-variant, coverage, and CNV outputs, with coordinates reported on hg19/GRCh37. Short variants were evaluated from high-confidence PASS/VCC-H calls, and CNV review incorporated both target-region and amplicon-level copy-number patterns. Results: After removal of four low-VAF review observations, the primary germline-compatible dataset comprised 304 short-variant observations representing 34 unique variants. Both BRCA1 and BRCA2 contributed comparable variant burdens, while the overall profile was mainly composed of missense and synonymous changes. Six sample-specific heterozygous BRCA1 truncating candidates were observed, including five frameshift variants and one stop-gain variant. Protein-level mapping placed these events across the central-to-C-terminal portion of BRCA1. Sequencing depth was consistently high across the targeted regions, with all 4,255 amplicon-sample measurements exceeding 280x and 99.91% reaching at least 500x. Copy-number analysis highlighted one candidate BRCA1 multi-exon deletion-like event involving exons 15-20 in BCSIR-BRCA-21, with unresolved partial exon 14 involvement. Conclusions: This study provides an initial Bangladesh-focused targeted BRCA1/BRCA2 dataset and identifies candidate short-variant and CNV findings for validation. 
These findings should be interpreted as analytical candidates only and require confirmatory testing and expert clinical curation before any clinical application. The cohort is referral-enriched and should not be used to infer population prevalence.

6
Breast cancer polygenic risk score performance varies by socioeconomic status

Domian, H. I.; Tian, X.; Ong, D.; Hamilton, L.; Shieh, Y.; Musharoff, S. A.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354819 medRxiv
Top 0.1%
6.4%

Background: Polygenic risk scores (PRS) for breast cancer are increasingly used for risk stratification to inform screening and prevention. However, for PRSs to be equitable and clinically useful, they need to perform well across diverse populations. While PRS performance is known to be ancestry-dependent, it is not well understood how environmental context, such as socioeconomic status (SES), affects PRS transferability. Here, we assess whether SES, measured via self-reported household income, modifies breast cancer PRS performance and, if so, whether socioeconomic context contributes predictive information beyond genetic risk alone. Methods: We used the US-based All of Us biobank to evaluate how SES impacts breast cancer PRS performance. First, we quantified changes in breast cancer PRS performance by modeling a commonly cited polygenic score for breast cancer previously described by Mavaddat et al. with SES. We then reestimated the genetic effect sizes of the 3,820 variants from Mavaddat et al. in All of Us with and without income as a covariate. Because social determinants of health affect breast cancer detection and outcomes, we stratified analyses by socially defined populations on the basis of self-identified race and ethnicity. We further stratified individuals whose self-identified race is White ("White") into three SES groups (high, middle, low) based on self-reported income and re-estimated genetic effect sizes to create SES-specific PRSs. We then applied these PRSs to White participants, the largest group in the study, and to Black or African American ("Black") and Hispanic or Latino ("Hispanic") participants, groups underrepresented in breast cancer research. Model discrimination between cases and controls was measured by area under the curve (AUC).
Results: We analyzed 163,715 women from the All of Us biobank, which included 8,833 breast cancer cases (6,619 White, 1,178 Black, and 1,036 Hispanic), with relative income available for a subset of these cases (5,525 White, 848 Black, and 566 Hispanic). The ancestry-dependent performance of the breast cancer PRS described in Mavaddat et al. was replicated in All of Us. In Black individuals, this PRS (AUC and 95% CI: 0.576 [0.571, 0.582]) produced an increase in AUC similar to that of relative income (AUC: 0.573 [0.568, 0.577]) when added to an age-only model. Incorporating income with PRS, age, and genetic PCs 1-3 improved AUC by 0.007 in White Americans and 0.018 in Black Americans (both p < 10^-11), while attenuating the contribution of PRS in the full model. PRS performance also varied among SES categories. Notably, PRSs with variant effect sizes that were recalibrated in low-SES White participants performed best in low-SES White participants (AUC: 0.605 [0.583, 0.628]) and Black Americans (AUC: 0.588 [0.586, 0.591]), both better than performance in high-SES White Americans (AUC: 0.579 [0.577, 0.580]) and middle-SES White Americans (AUC: 0.578 [0.569, 0.586]). Conclusion: Socioeconomic context, measured by income, significantly impacts the transferability of a PRS for breast cancer within and among groups defined by self-identified race and ethnicity. Accounting for SES improves PRS performance, most notably in Black Americans and low-SES White individuals.
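For readers less familiar with the AUC values quoted in this abstract, the metric can be read as the probability that a randomly chosen case scores higher than a randomly chosen control. A minimal sketch of that pairwise definition follows; the scores are made up for illustration and have no connection to the All of Us data.

```python
def auc(case_scores, control_scores):
    """Fraction of case/control pairs in which the case scores higher.

    Ties count as half a win, matching the usual Mann-Whitney formulation.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores: 8 of the 9 case/control pairs are ranked correctly.
example = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

On this scale 0.5 is chance-level ranking, so differences in the third decimal place, as reported above, correspond to small changes in how often cases outrank controls.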

7
Integrated T-Cell Receptor Repertoire and Tumor Immunogenicity Profiling Reveals Distinct Immunogenomic States in Endometrial Cancer

Aversa, I.; Abatino, A.; Isabello, A.; Gallo, R.; Isdraele, L.; Straface, T.; Zullo, F. M.; Guida, M.; Saccone, G.; Fiume, G.; Venturella, R.; Viglietto, G.; Cuda, G.; Costanzo, F.; Zullo, F.; Palmieri, C.

2026-06-10 oncology 10.64898/2026.06.08.26355191 medRxiv
Top 0.1%
6.3%

Background Endometrial cancer exhibits marked molecular and immune heterogeneity that is only partially explained by established genomic biomarkers. We investigated whether T cell receptor (TCR) repertoire architecture captures complementary dimensions of antitumor immunity beyond conventional molecular classification. Methods Paired tumor and peripheral blood samples from eight patients with molecularly characterized endometrial cancer underwent TCR repertoire profiling. Diversity, clonality, and tumor-blood overlap metrics were integrated with genomic variables, including tumor mutational burden (TMB), genomic instability metric (GIM), and POLE status. Principal component analysis and correlation analyses were used to identify major dimensions of repertoire organization. Composite Immune Focusing and Immune Sharing Scores were derived to summarize dominant repertoire patterns. Results The first two principal components explained 70.1% of total repertoire variance and revealed substantial heterogeneity independent of histological subtype. TMB was strongly associated with reduced repertoire diversity and increased clonal dominance, resulting in a robust association with the Immune Focusing Score (ρ = 0.88, p = 0.004). POLE-mutated tumors occupied the extreme end of this focusing continuum. In contrast, genomic instability was associated with increased tumor-blood repertoire overlap and preserved diversity, reflected by a strong correlation between GIM and the Immune Sharing Score (ρ = 0.76, p = 0.027). The two immune scores showed minimal correlation with each other (ρ = -0.24, p = 0.57), indicating that they capture largely independent aspects of immune organization. Conclusion Integrative analysis of TCR repertoire architecture and tumor genomics identifies distinct immunogenomic states in endometrial cancer that are not fully captured by conventional molecular classification.
If validated in larger cohorts, immune focusing and immune sharing metrics may provide complementary biomarkers for patient stratification and immunotherapy-oriented precision oncology.

8
Retrospective cohort study extracting coexisting background breast-lesion features from stage I-III invasive breast cancer

Lim, R. J. Y.; Nitar, P.; Lau, K. W.; Leong, L. C. H.; Lim, G. H.; Tan, V. K. M.; Tan, B. K. T.; Tan, E. Y.; Goh, S. S. N.; Hartman, M.; Wong, F. Y.; Li, J.; Joint Breast Cancer Registry

2026-05-22 oncology 10.64898/2026.05.19.26353633 medRxiv
Top 0.1%
6.0%

Background Background breast features are frequently noted in pathology reports alongside invasive breast cancer but rarely factor into prognosis or treatment decisions. Their relationship to tumor characteristics and patient outcomes remains incompletely characterised. Methods We conducted a retrospective cohort study of 7,603 patients with Stage I-III invasive breast cancer (diagnosed 1991-2022, age <80 years) from the Joint Breast Cancer Registry in Singapore. Natural language processing (NLP) was applied to 9,754 free-text pathology reports to extract co-existing background breast features, with accuracy validated by dual-reviewer assessment of 200 reports. Unsupervised hierarchical clustering grouped extracted features into three categories. Associations with tumor characteristics were assessed by multinomial logistic regression, and ten-year overall survival by Cox proportional hazards models (median follow-up 9.6 years; 620 deaths). Results Here we show that NLP-based extraction of background breast features from routine pathology reports achieves an accuracy of over 90% across features. Lobular neoplasia and benign proliferative changes are associated with less aggressive tumor characteristics, whereas early neoplastic and papillary lesions are more prevalent in HER2-enriched and luminal B tumor subtypes. Benign proliferative changes are associated with better survival in age- and year-adjusted models (hazard ratio 0.91, 95% CI 0.86-0.97), but this association is attenuated after adjustment for stage and subtype. Conclusions NLP-enabled extraction of background breast features from pathology text is feasible at scale. These features reflect tumor biology but do not independently add prognostic information beyond established clinical variables.

9
Documented clinical genetic testing among carriers of hereditary breast and ovarian cancer variants: Ancestry and socioeconomic disparities in the All of Us research program

Yerukala Sathipati, S.; Scott, H.

2026-06-10 oncology 10.64898/2026.06.09.26355262 medRxiv
Top 0.1%
5.0%

Importance: Hereditary breast and ovarian cancer (HBOC) variant carriers benefit from risk-reducing interventions, but only if identified. The extent to which carriers are clinically recognized, and whether recognition is equitable across diverse populations, is poorly characterized in a single large U.S. cohort. Objective: To estimate P/LP HBOC carrier prevalence across genetic ancestry groups, quantify documented clinical genetic testing among carriers, and evaluate ancestry and socioeconomic disparities in testing. Design, Setting, and Participants: Cross-sectional analysis of the All of Us Research Program Controlled Tier (Curated Data Repository v8/C2024Q3R9), comprising participants with short-read whole genome sequencing and linked electronic health record (EHR) and survey data. Carriers were ascertained from research genomic data independent of clinical testing. Exposures: Genetically inferred ancestry (African [AFR], Admixed American [AMR], East Asian [EAS], European [EUR], Middle Eastern [MID], South Asian [SAS]); self-reported household income and educational attainment. Main Outcomes and Measures: (1) Carrier prevalence with Wilson 95% CIs; (2) documented clinical genetic testing (procedure codes) among carriers; (3) adjusted odds of documented testing among women, by ancestry, before and after socioeconomic adjustment, using multivariable logistic regression. Results: Among 414,830 participants, P/LP HBOC carrier prevalence was 1.42% (95% CI, 1.38-1.45) overall and similar across ancestry groups (AFR 1.24%, AMR 1.32%, EAS 1.19%, EUR 1.52%, MID 1.68%, SAS 1.33%; overlapping CIs). Among 250,071 women in the testing analysis, documented clinical genetic testing was rare: only 74 of 5,878 carriers overall (1.3%) and 59 of 3,572 European-ancestry carriers (1.7%) had a documented test, with counts below reportable thresholds in all other ancestry groups. 
African-ancestry women had lower adjusted odds of documented testing than European-ancestry women (Model 1 adjusted odds ratio [aOR], 0.32; 95% CI, 0.27-0.39), an association that attenuated but persisted after adjustment for income and education (Model 2 aOR, 0.48; 95% CI, 0.40-0.58; P < 0.001); Admixed American women also had reduced adjusted odds (aOR, 0.71; 95% CI, 0.61-0.84). Lower income and lower education were independently and dose-dependently associated with lower testing odds (income <$25,000 aOR, 0.46; high-school education aOR, 0.54). Conclusions and Relevance: High-risk HBOC variant carriers are present across all ancestry groups at similar frequencies, yet documented clinical genetic testing differed across ancestry groups. African-ancestry women experience a testing gap that is not fully explained by socioeconomic position, implicating structural barriers in access and referral. Population-level strategies that decouple carrier identification from current referral pathways may be required to close this gap.
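The abstract reports carrier prevalence with Wilson 95% CIs. As a rough sketch of how a Wilson score interval is computed, the function below applies the standard formula to the 74-of-5,878 testing count quoted above. The abstract does not report an interval for that particular figure, so the result is illustrative rather than a reproduction of the authors' analysis; the function name and the default `z` are choices made for this sketch.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# 74 of 5,878 carriers with a documented test (counts taken from the abstract).
low, high = wilson_interval(74, 5878)
```

Unlike the simple normal-approximation interval, the Wilson interval stays within 0 and 1 and behaves reasonably for small proportions, which is presumably why it suits prevalence figures near 1%.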

10
Development and Validation of a Machine Learning Model to Predict Prognosis in Patients with Advanced Head and Neck Cancer

Zhang, K.; Gao, L.; John, D.; Li, W. T.; Hogarth, M.; Coffey, C. S.; Ongkeko, W. M.

2026-05-28 oncology 10.64898/2026.05.27.26354194 medRxiv
Top 0.1%
4.8%

Importance Prognostic tools beyond staging are needed to guide treatment and counseling in head and neck squamous cell carcinoma (HNSCC). Objective To develop and externally validate a machine learning model predicting survival in advanced HNSCC using routinely collected clinical and biomarker data. Design, Setting, and Participants Retrospective, multi-institutional cohort study including 2,385 patients with stage III-IV HNSCC diagnosed from 2012-2022 in the University of California Health Data Warehouse (UCHDW). Patients were randomly split into training (n = 1,908) and test (n = 477) sets. Partial external validation used 7,749 patients from the Surveillance, Epidemiology, and End Results (SEER) registry (2010-2020). Exposures Demographic, tumor, treatment, comorbidity, and biomarker variables recorded at or before diagnosis. Main Outcomes and Measures The primary outcome was all-cause mortality within 70 months. Cox proportional hazards models included all predictors. Discrimination was assessed with Harrell's concordance index (C-index), calibration with predicted vs observed survival, and stratification with Kaplan-Meier curves. A Random Survival Forest (RSF) was trained for benchmarking and interpretability using Shapley Additive exPlanations (SHAP). Results Among 2,385 patients in UCHDW (median age, 63 years; 29.0% mortality), the Cox model achieved a C-index of 0.735 in the internal test set. Risk quartiles showed clear separation on Kaplan-Meier curves (log-rank p < 0.0001). In the SEER cohort (n = 7,749), where only demographic, staging, subsite, and treatment variables were available, the reduced Cox model achieved a C-index of 0.688, with calibration showing modest underestimation of survival in high-risk groups. Age, T stage, Charlson Comorbidity Index, neutrophil-to-lymphocyte ratio, and platelet count were among the strongest predictors, while surgery was associated with improved survival. 
The RSF achieved a C-index of 0.758 internally, with SHAP highlighting nonlinear effects of albumin, BMI, and inflammatory markers. Conclusions and Relevance A machine learning model using routine clinical and biomarker data demonstrated good prognostic performance in advanced HNSCC, with partial external validation. Such approaches may support individualized survival estimates, risk stratification, and treatment discussions, but broader validation is required before clinical adoption.
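Harrell's concordance index, the discrimination metric used in this abstract, can be sketched in a few lines. The version below is a simplified illustration, not the authors' code: it ignores tied event times, and the toy data are invented. A pair of patients is comparable when the one with the shorter follow-up had an observed event; the pair is concordant when that patient also has the higher predicted risk.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C-index for right-censored data (tied times are skipped)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up in months, event flag (1 = death, 0 = censored), predicted risk.
c_index = harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the 0.688 to 0.758 range reported here.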

11
Breast cancer over-diagnosis due to mammography screening - A long-term follow-up population study of BreastScreen Norway

Heggland, T.; Vatten, L. J.; Opdahl, S.; Weedon-Fekjaer, H.

2026-06-03 epidemiology 10.64898/2026.06.02.26354696 medRxiv
Top 0.1%
4.2%

Objectives Estimates of breast cancer over-diagnosis related to mammography screening vary substantially. Over-diagnosis is commonly defined as cases that would not have been detected during the person's remaining lifetime in the absence of screening. We here aim to quantify over-diagnosis in the population-based BreastScreen Norway mammography screening program using long-term follow-up and more detailed modeling than previous studies. Setting We applied data on Norwegian screening patterns and breast carcinoma incidence for the period 1987-2019, covering women aged 49-84 years, leveraging the gradual implementation of the organized biennial BreastScreen Norway screening program for women aged 50-69 during 1995-2005. Methods Using an extended age-period-cohort model, we estimated excess lifetime risk of invasive breast cancer and ductal carcinoma in situ in the presence of program screening, as an indicator of over-diagnosis among screen-detected cases. Results Lifetime risk of breast carcinomas was 6.6% (95% confidence interval 2.5% to 10.7%) higher for invited than for non-invited women. This indicates that 18% (95% confidence interval 7.3% to 28.0%) of screen-detected cases may be over-diagnosed, and that approximately one in 86 (95% confidence interval 54 to 210) screened women were over-diagnosed during their screening period. Using effect estimates from previous studies, we estimated that approximately three women are over-diagnosed for every breast cancer death prevented by screening, and that 87% of over-diagnosed tumors might grow extremely slowly. Conclusions Over-diagnosis related to mammography screening is a considerable problem, but its extent may be smaller than reported in some previous studies. Most over-diagnosed tumors likely grow very slowly.

12
Enfortumab vedotin-induced cutaneous toxicities and their association with survival in urothelial carcinoma

Lee, E.; Karagenova, R.; Lu, C.; Farokh, P.; Azin, M.; Repetto, F.; Jobbagy, S.; Nazarian, R. M.; Reynolds, K.; Demehri, S.; Saylor, P. J.; Fuksman, L.; Semenov, Y. R.

2026-05-21 oncology 10.64898/2026.05.19.26353579 medRxiv
Top 0.2%
3.6%

Importance: Enfortumab vedotin (EV) is an antibody-drug conjugate approved for the treatment of locally advanced or metastatic urothelial cancer (la/mUC). Cutaneous adverse events (cAEs) are common during EV therapy, with prior studies suggesting an association between EV-related cAEs and improved survival; however, there are insufficient data to delineate the survival benefit of EV-induced cAEs from those associated with concurrent immune checkpoint inhibitors (ICIs). Objective: This study aims to evaluate the association of EV-induced cAEs and survival, and to characterize the timing and morphology of EV-induced cAEs. Design: We conducted a multi-institutional retrospective study of patients with la/mUC treated with EV between 2020 and 2025. Setting: Multicenter academic referral center. Participants: A total of 449 EV-treated patients were included. Patient characteristics were extracted manually, and likelihood scoring was used to attribute cAEs to either EV or other etiologies. Exposure: EV treatment. Main Outcomes and Measures: We estimated progression-free (PFS) and overall (OS) survival using the Kaplan-Meier method. Multivariable time-varying and landmark Cox regression models were used to evaluate associations between EV-induced cAEs and survival. Sensitivity analyses were performed at landmarks from 15 to 105 days. Results: Of 449 patients, 206 (45.9%) developed a cAE; 39 (18.9%) were high-grade and 127 (61.7%) were attributed to EV. The most common cAEs were pruritus (41.3%), unspecified and desquamating dermatitis (37.3%), and morbilliform dermatitis (27.7%). Across all treatment groups, survival was longer in patients with EV-induced cAEs. Developing an EV-induced cAE was protective across all examined landmark times, with hazard ratio (HR) 0.60 (95% CI: 0.43-0.82, p<0.001) for PFS and HR 0.46 (95% CI: 0.31-0.67, p<0.001) for OS at the primary landmark time of 30 days.
Early-onset EV-induced cAEs were protective at all landmark times and high-grade EV-induced cAEs were not associated with worse survival. Conclusions and Relevance: EV-induced cAEs were independently associated with improved PFS and OS in patients with la/mUC, even after accounting for immortal time bias and ICI exposure. Distinguishing EV-induced cAEs from other etiologies in timeline and morphology may help guide oncology and dermatology management.

13
Deep Learning Spatial Profiling of CD103+CD8+ T Cells and Survival in Rectal Cancer After Neoadjuvant Chemoradiotherapy

Abe, T.; Yamashita, K.; Nagasaka, T.; Fujita, M.; Ueda, Y.; Miyake, S.; Ito, R.; Adachi, Y.; Ando, M.; Tsuneki, T.; Okazoe, Y.; Konaka, R.; Takahashi, T.; Kagiyama, H.; Tachibana, T.; Imai, M.; Yoshida, T.; Saito, M.; Mukohyama, J.; Kanayama, K.; Koma, Y.-I.; Otowa, Y.; Hasegawa, H.; Ikeda, T.; Koterazawa, Y.; Aoki, T.; Harada, H.; Urakawa, N.; Goto, H.; Kanaji, S.; Yanagimoto, H.; Matsuda, T.; Takamura, S.; Yamashita, T.; Sasaki, R.; Fukumoto, T.; Kakeji, Y.

2026-05-28 oncology 10.64898/2026.05.26.26353629 medRxiv
Top 0.2%
3.5%

Background: CD8+ tumor-infiltrating lymphocytes (TILs) are established prognostic markers in colorectal cancer, yet the clinical significance of CD103+CD8+ tissue-resident memory-like (TRM-like) T cells in locally advanced rectal cancer (LARC) after neoadjuvant chemoradiotherapy (NACRT) remains unknown. Methods: We quantified CD8+ and CD103+CD8+ T-cell densities in stromal and intratumoral compartments of post-NACRT resection specimens from 40 LARC patients using Cu-Cyto, a deep learning-based imaging cytometry platform. Associations with survival, pathological response, and adjuvant chemotherapy (AC) were examined. Treatment-induced T-cell dynamics were assessed in paired pretreatment biopsies and post-NACRT resections (n = 9). Results: High stromal CD103+CD8+ density independently predicted better 5-year RFS (67.4% vs. 12.1%, p < 0.001) and OS (80.0% vs. 26.6%, p = 0.016); intratumoral density showed no prognostic significance. Pathological response correlated with stromal CD8+ but not CD103+CD8+ density. Paired analysis revealed a selective non-expansion of the CD103+ subset: stromal CD8+ T cells increased significantly after NACRT while CD103+CD8+ density remained unchanged. AC may preferentially benefit patients with low stromal CD103+CD8+ density. Conclusions: Stromal CD103+CD8+ T-cell density is a robust independent prognostic biomarker in rectal cancer after NACRT that appears to reflect pre-existing rather than treatment-induced immunity. Given its stability across NACRT, pretreatment biopsy assessment may provide equivalent prognostic information, with potential implications for patient stratification before treatment initiation.

14
Contextualizing the Utility of Polygenic Risk Scores using Absolute Risk Models in Diverse Ancestry Populations

Chatterjee, N.; Martina, F.; Kachuri, L.; Natarajan, P.; Witte, J.; Huo, D.

2026-06-04 genetic and genomic medicine 10.64898/2026.06.03.26354842 medRxiv
Top 0.2%
3.5%

Polygenic risk scores (PRSs) are emerging as powerful tools for quantifying inherited risk for common diseases and, in some cases, are approaching clinical implementation. A major concern for PRS implementation is their limited accuracy in non-European populations, particularly in those of African ancestry. However, past evaluations have focused on metrics such as relative risk or AUC, which do not capture background risk arising from contextual factors. We introduce a novel measure of variable importance, the conditional average derivative estimator (CADE), to evaluate PRS utility across diverse contexts and populations within absolute risk models that integrate PRSs with other relevant risk factors. We illustrate this framework by integrating PRSs for breast and prostate cancer within age-specific absolute risk models for incidence and mortality fit using individual-level data from the All of Us Research Program with inputs from the National Cancer Institute SEER cancer registry. Our projections show that although the PRSs are known to have the lowest discriminatory accuracy in African Americans (AA), there are contexts in which they provide greater utility, such as for the stratification of prostate cancer risk and mortality, where the CADE values for AA were 2- and 7-fold higher than for European Americans. These findings suggest that conclusions about the limited clinical utility of PRS in non-European populations may be premature and underscore the need to quantify PRS risk-stratification utility at the absolute-risk level, while accounting for disease onset, survival, and broader health and economic factors.

15
Cross-Cancer Profiling of Cadherin-1 Reveals Context-Dependent Epithelial-Mesenchymal Transition Decoupling, Immune Heterogeneity, and Prognostic Variability in Epithelial Cancers

Rahman, M. A.; Bellah, S. F.; Rahman, M. M.

2026-05-27 cancer biology 10.64898/2026.05.22.727338 medRxiv
Top 0.2%
3.0%

Background: CDH1 (E-cadherin) is a key epithelial adhesion molecule traditionally associated with tumor suppression and epithelial-mesenchymal transition (EMT). However, its roles across cancers remain incompletely understood, particularly within multilayer regulatory contexts involving genomic, epigenetic, transcriptional, and immune mechanisms. Methods: CDH1 expression, survival associations, EMT-correlated gene profiles (VIM, SNAI1, ZEB1), immune infiltration patterns, immune checkpoint correlations (PDCD1, CD274, CTLA4), promoter methylation, and genomic alterations were assessed across five epithelial cancers: breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), lung adenocarcinoma (LUAD), ovarian cancer (OV), and stomach adenocarcinoma (STAD). Cross-platform validation was performed using TCGA/GDC datasets, GEPIA2, UALCAN, TIMER, KM Plotter, cBioPortal, and g:Profiler. Results: CDH1 was overexpressed but showed variable prognostic significance; higher expression predicted better survival in COAD, LUAD and STAD, worse survival in BRCA and had no impact in OV. Classic inverse relationships between CDH1 and VIM or ZEB1 were evident only in STAD, and SNAI1 showed no consistent association. Immune infiltration patterns were tumor-specific, ranging from cytotoxic T-cell dominance in LUAD to macrophage-rich profiles in OV; immune checkpoint correlations were similarly context-dependent. Co-expressed genes were enriched for endomembrane transport rather than adhesion pathways. Promoter methylation patterns varied by cancer, whereas genomic alterations of CDH1 were rare. Conclusions: CDH1 does not function as a universal epithelial or EMT marker across epithelial cancers. Instead, its associations with EMT, immune contexture, methylation, and prognosis are context-dependent, supporting a model of CDH1 as a heterogeneous regulator of epithelial plasticity.
These findings challenge single-function interpretations and support cancer-specific CDH1 evaluation in translational research.

16
Improved prostate cancer prediction by combining Prostate-Specific Antigen (PSA) test results with Genetic Risk Scores (GRS/PRS)

Lu, J.; Chen, G.; Merriel, S. W. D.; Weedon, M. N.; Murray, A.; Bailey, S. E. R.; Green, H. D.

2026-05-18 genetic and genomic medicine 10.64898/2026.05.14.26353195 medRxiv
Top 0.2%
3.0%

Background: Prostate cancer is the second most common cancer in men worldwide. The Prostate-Specific Antigen (PSA) blood test is widely used for prostate cancer detection but suffers from high false-positive rates (up to 80%). Genetic risk scores (GRS/PRS) perform similarly to PSA testing in predicting prostate cancer risk. Methods: GRS269 for prostate cancer was derived using 269 known risk variants and applied to UK Biobank participants. We assessed whether GRS269 improved power to predict prostate cancer diagnosis on top of age and pre-prostatectomy PSA level among 17,380 cases. Longitudinal PSA measurements were summarised as median, first, last (most recent), and random PSA. All models were adjusted for age. Results: Across all PSA measures, the integrated model combining GRS269, PSA, and age consistently outperformed models using GRS269 or PSA alone. The highest predictive performance was observed using the last PSA value combined with GRS269 (AUC = 0.82, 95% CI: 0.81-0.82), compared to GRS269 alone (AUC = 0.70, 95% CI: 0.68-0.72) or PSA alone (AUC = 0.73, 95% CI: 0.70-0.75). Conclusion: Combining genetic risk with PSA and age improves prostate cancer risk prediction in a population setting. These findings suggest that integrating GRS could enhance early prostate cancer prediction pathways in primary care.

17
Basal gland localization and focal distribution of OLFM4-expressing cells in increasing severity of gastric intestinal metaplasia

Sathe, A.; Meka, R.; Geier, B.; Long, R.; Wong, C.; Han, S.; Shen, J.; Amieva, M. R.; Ji, H. P.; Huang, R. J.

2026-05-20 cancer biology 10.64898/2026.05.14.725297 medRxiv
Top 0.2%
2.8%

Patients with gastric intestinal metaplasia (GIM), a precancerous lesion, are at high risk for progressing to gastric cancer. Identifying these patients is critical to enable gastric cancer interception. Current approaches rely primarily on histologic evaluation of GIM severity and extent, which may be improved by incorporating molecular features that distinguish high-risk lesions. Our prior single-cell and spatial transcriptomics study identified differentially expressed genes associated with the highest-risk category of GIM. They included ANPEP expressed in enterocytes and CPS1 and OLFM4 expressed in intestinal stem-like or progenitor cells. We evaluated the protein expression and localization of these three markers to understand the cellular features associated with GIM risk and their spatial distribution within metaplastic tissues. Using multiplex immunofluorescence, whole slide image analysis and confocal microscopy, we examined protein expression from 100 tissue biopsies annotated for metaplasia severity using the Operative Link on Gastric Intestinal Metaplasia Assessment (OLGIM) system. Tissue samples included control gastric tissue, GIM, dysplasia and adenocarcinoma. Quantitative whole slide image analysis demonstrated that CPS1 expression had a modest association with disease severity. Although ANPEP was strongly associated with GIM severity, it was also frequently expressed in stromal regions outside epithelial glands. In contrast, OLFM4 expression was largely restricted to epithelial glands and showed a strong association with increased OLGIM severity. These OLFM4-positive epithelial cells were present in discrete glandular foci that expanded with increasing severity of metaplasia. Within individual metaplastic glands, OLFM4 expression was highest at the gland base with decreased expression toward the gland surface. Overall, these findings identified OLFM4 as a protein marker associated with high-risk GIM. 
The spatial organization of OLFM4-expressing cells at the base of metaplastic glands and their focal expansion within tissues suggest the presence of a stem cell-like epithelial compartment that may contribute to the progression of GIM towards gastric cancer.

18
Liquid Biopsy of HPV Cell-Free DNA Enables Blood-Based Early Detection and Molecular Stratification of HPV-Associated Cancer and Precancer Stages

Wang, Q.; Eldfors, S.; Lee, S. S.; Das, D.; Al-Inaya, Y.; Lumaj, G.; Epstein, E. T.; Shukla, S.; Ricart, E.; Dhillon, H.; Lake, J.; Hirayama, S.; Adalsteinsson, V. A.; Drage, M. G.; Gulhan, D. C.; Davis, B. T.; Faden, D.

2026-05-14 oncology 10.64898/2026.05.11.26352922 medRxiv
Top 0.2%
2.1%

Liquid biopsies targeting circulating tumor DNA enable noninvasive cancer detection but lack sensitivity in pre- and early-cancer stages, where clinical benefits would be greatest. Human papillomavirus (HPV) causes six cancer types, accounting for 5% of all cancers worldwide. Targeting HPV cell-free (cf)DNA offers a compelling opportunity to overcome current liquid biopsy constraints due to its unique tumor-specific origin, lack of sequence homology to the human genome, and the high viral-to-human copy ratio per cell. Utilizing HPV-associated anal cancer and precancer as a model, here we applied a custom, multi-feature HPV whole-genome liquid biopsy to biobanked and prospective screening cohorts spanning the HPV infection-precancer-cancer continuum. HPV cfDNA was detected years before cancer diagnosis and as early as the infection stage, with increasing detection as stages advanced. Genomic hallmarks of HPV malignancy, including HPV integration, PIK3CA mutations, and 3q amplification, were detected exclusively in cancer, while precancers exhibited distinct HPV genotypes. Fragmentomics analysis of HPV cfDNA revealed stage-informative signatures reflecting viral epigenetic changes during carcinogenesis. A unified classifier incorporating genomic and fragmentomics features achieved a mean AUC of 0.77 for identifying cancer and high-grade precancer, stages requiring clinical intervention. Together, these findings demonstrate the feasibility of blood-based screening and molecular risk stratification for HPV-associated cancer and precancer. Teaser: Profiling blood HPV cell-free DNA detects cancer years early and distinguishes precancers needing intervention from surveillance.

19
Formalising Limits of Circulating Tumour DNA Detection: A Signal Detection Framework for Clinical Threshold Specification

Walinjkar, A.

2026-06-10 oncology 10.64898/2026.06.08.26355204 medRxiv
Top 0.2%
2.0%

Background: Circulating tumour DNA (ctDNA) liquid biopsy is now established across oncology for early cancer detection, minimal residual disease surveillance, and treatment monitoring. Detection thresholds for all current ctDNA assays are derived empirically through receiver operating characteristic analysis on training cohorts - a statistically valid but theoretically uninformed approach that does not specify the minimum detectable tumour fraction given assay technical characteristics, nor identify when increasing sequencing depth ceases to provide additional clinical information. Methods: We model ctDNA detection as a binary hypothesis testing problem with Binomial-distributed mutant allele counts against a sequencing error noise floor. The Neyman-Pearson lemma is applied to derive the uniformly most powerful detector and the minimum detectable tumour fraction in closed form. The sequencing assay is modelled as a binary symmetric channel and Shannon channel capacity is calculated. Empirical validation uses n=61 data points extracted from five published peer-reviewed analytical validation studies across five independent institutions in the US and EU (2018 - 2025): Yu et al. 2022, Stetson et al. 2018, Frydendahl et al. 2023, Northcott et al. 2024, and Cheng et al. 2025. Results: The minimum detectable tumour fraction is derived in closed form as f_min approximately equal to (z_alpha + z_beta) multiplied by the square root of (epsilon divided by N), where N is sequencing depth, epsilon is the platform error rate, and z_alpha, z_beta are standard normal quantiles at the specified false positive and false negative rates. Shannon channel capacity is C = 1 minus H(epsilon) bits per read, where H(epsilon) is binary entropy. Empirical validation yields 84.3% agreement for single-locus assays. 
Discordance for multi-locus tumour-informed assays (NeXT Personal, duplex WGS) is consistent with the single-locus model scope and identifies the principal theoretical extension required. Conclusions: This framework provides the first formal Neyman-Pearson optimality proof for ctDNA detection, a closed-form detection limit, and a platform-independent efficiency metric for NHS and regulatory standardisation. Keywords: circulating tumour DNA; liquid biopsy; Neyman-Pearson detection; Shannon channel capacity; sequencing depth; limit of detection; minimal residual disease; signal detection theory
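
The closed-form quantities quoted in the abstract above can be evaluated numerically. The following is a minimal sketch rather than the author's implementation: the function names and the default alpha and beta values are illustrative, and it assumes z_alpha and z_beta are the upper standard normal quantiles, i.e. the inverse normal CDF at 1 - alpha and 1 - beta.

```python
import math
from statistics import NormalDist


def min_detectable_fraction(depth, error_rate, alpha=0.05, beta=0.05):
    """Approximate f_min = (z_alpha + z_beta) * sqrt(epsilon / N).

    Assumption: z_alpha and z_beta are upper standard normal quantiles
    for the false positive rate alpha and false negative rate beta.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    z_beta = NormalDist().inv_cdf(1.0 - beta)
    return (z_alpha + z_beta) * math.sqrt(error_rate / depth)


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def channel_capacity(error_rate):
    """Capacity of a binary symmetric channel, C = 1 - H(epsilon), in bits per read."""
    return 1.0 - binary_entropy(error_rate)


if __name__ == "__main__":
    # Illustrative inputs only: depth N = 10,000 reads, error rate epsilon = 1e-4.
    print(min_detectable_fraction(10_000, 1e-4))
    print(channel_capacity(1e-4))
```

Because f_min scales with the square root of epsilon / N, quadrupling the sequencing depth only halves the minimum detectable tumour fraction under this formula.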

20
Integrating enriched case data from national laboratory testing with population-based case-control analyses: a novel statistical likelihood-ratio methodology for PS4 applied to 325,345 breast cancer cases and 671,006 controls

Allen, S.; Rowlands, C. F.; Garrett, A.; Couch, F.; Richardson, M. E.; Pesaran, T.; Pethick, J.; Lavelle, K.; McRonald, F.; Vernon, S.; Torr, B.; Loong, L.; Aungraheeta, R.; Durkie, M.; Burghel, G. J.; Callaway, A.; Robinson, R.; Field, J.; Frugtniet, B.; Palmer-Smith, S.; Grant, J.; Pagan, J.; McDevitt, T.; Snape, K.; Hanson, H.; McVeigh, T.; Loveday, C.; Jones, M.; Hardy, S.; Turnbull, C.; CanVIG-UK,

2026-05-17 genetic and genomic medicine 10.64898/2026.05.13.26353095 medRxiv
Top 0.3%
1.9%

Background: For many evidence criteria within v3.0 of the ACMG/AMP guidelines, methodologies have been developed to empower their use outside the stipulated evidence strengths. However, no such methodology has been established for case-control data (PS4). With the release of large-scale unselected case-control datasets and expansion of nationally-collected laboratory datasets enriched for pathogenic variant carriers, there is potential to combine datasets across ascertainment contexts in a more quantitative manner using novel likelihood ratio tools. Methods: Using our published PS4-LR-Calculator, we calculated a combined log likelihood ratio (PS4-LLR) across five datasets (three unselected, and two enriched), and estimated enrichment of pathogenic variants in clinically-ascertained laboratory data using truncating variant prevalence. Results: Data were combined for 10,817 missense variants from 325,345 female breast cancer patients and 671,006 controls of Western European ancestry for five breast cancer susceptibility genes (BRCA1, BRCA2, PALB2, ATM, CHEK2). A combined LLR was produced for 4,690 missense variants; 927 variants received evidence towards pathogenicity (LLR ≥ 1), and 3,242 received evidence towards benignity (LLR ≤ -1). Conclusion: This flexible, variant-level methodology combines nationally-collected 'enriched' datasets with unselected case-control cohorts, expanding the available information for case-control analysis, boosting power, enabling exploration of atypical penetrance and empowering variant classification.
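
As a rough illustration of how per-dataset evidence might be pooled and thresholded, the sketch below sums log likelihood ratios across datasets and applies the cut-offs quoted in the abstract (LLR ≥ 1 and LLR ≤ -1). The abstract does not describe the internals of the PS4-LR-Calculator, so the simple summation (which assumes the datasets are independent), the function names, and the "uninformative" label are all assumptions made for illustration.

```python
import math
from typing import Sequence


def combine_llrs(per_dataset_llrs: Sequence[float]) -> float:
    """Pool per-dataset log likelihood ratios by summation.

    Assumption: datasets are independent, so likelihood ratios multiply
    and their logarithms add. This is not taken from the source.
    """
    return math.fsum(per_dataset_llrs)


def classify(llr: float, pathogenic_cutoff: float = 1.0, benign_cutoff: float = -1.0) -> str:
    """Map a combined LLR onto the evidence directions named in the abstract."""
    if llr >= pathogenic_cutoff:
        return "towards pathogenicity"
    if llr <= benign_cutoff:
        return "towards benignity"
    return "uninformative"  # hypothetical label for the middle band


if __name__ == "__main__":
    # Made-up LLR values for one variant across five datasets.
    example = [0.4, 0.5, 0.3, -0.1, 0.2]
    combined = combine_llrs(example)
    print(combined, classify(combined))
```

The per-dataset values here are invented; in practice each would come from the case and control carrier counts in that dataset, adjusted for its ascertainment context.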